Abstract

A sequence of reversals that takes a signed permutation to the identity is perfect if at no 
step a common interval is broken. Determining a parsimonious perfect sequence of reversals that 
sorts a signed permutation is NP-hard. Here we show that, despite this worst-case analysis, with 
probability one, sorting can be done in polynomial time. Further, we find asymptotic expressions 
for the average length and number of reversals in commuting permutations, an interesting
subclass of signed permutations.

1 Introduction 

The sorting of signed permutations by reversals is a simple combinatorial problem with a direct
application in genome arrangement studies. Different sorting scenarios provide estimates for
evolutionary distance and can help explain the differences in gene orders between two species
(see [9] for example). Initially, the shortest (parsimonious) sequences of reversals were sought, and
polynomial time algorithms to find such sequences were described ([HllHllIH]). Recently, biologically
motivated refinements have been considered, specifically accounting for groups of genes that are
co-localized with the different homologous genes (genes having a single common ancestor) in the
genomes of different species. These groups were likely together in the common ancestral genome and
were not disrupted during evolution; hence, we expect them to appear together at every step of the
evolution. In terms of our combinatorial model, a group of co-localized genes is modeled by a common
interval, that is, a collection of sequential numbers that are not broken by any reversal move. This
constraint leads us back to the basic algorithmic problem:

What is the smallest number of reversals required to sort a signed permutation into the 
identity permutation without breaking any (subset of) common interval? 

These scenarios are called perfect [H]. Because of the additional constraint, it is possible that
the smallest perfect sorting scenario is longer than the smallest scenario.

Already it is known that this refined problem is NP-hard [11]. However, several authors have
given sub-instances which can be solved in polynomial time [HllUlTO], and fixed parameter tractable
algorithms exist [4][5]. For example, commuting permutations are the sub-class with the striking
property that a scenario remains perfect even when its sequence of reversals is reordered. Examples
of commuting scenarios arise in the study of mammals. All of the known sub-problems can be expressed
in terms of the "strong interval tree" associated to a permutation, and we focus our attention on
the structure of this tree.

Recently, several works have investigated expected properties of combinatorial objects related to
genomic distance computation, such as the breakpoint graph [20] ETJ [19] [17]. We explore this route
here, but focusing on the strong interval tree, to conduct an average case analysis of perfect sorting
by reversals. First, in Section 3 we prove that for large enough n, with probability 1, computing
a perfect reversal sorting scenario for signed permutations can be done in time polynomial in n,
despite the fact that the problem is NP-hard. Secondly, in Section 4 we show that in parsimonious
perfect scenarios for commuting permutations of length n, the average number of reversals is
asymptotically 1.2n, and the average length of a reversal is $1.02\sqrt{n}$.

2 Preliminaries 

We first summarize the combinatorial and algorithmic frameworks for perfect sorting by reversals. 
For a more detailed treatment, we refer to [4].

Permutations, reversals, common intervals and perfect scenarios. A signed permutation
on [n] is a permutation on the set of integers [n] = {1, 2, . . . , n} in which each element has a sign,
positive or negative. Negative integers are represented by placing a bar over them. We denote by
$Id_n$ (resp. $\overline{Id}_n$) the identity (resp. reversed identity) permutation, $(1\ 2\ \ldots\ n)$
(resp. $(\bar{n}\ \ldots\ \bar{2}\ \bar{1})$). When the number n of elements is clear from the context,
we will simply write $Id$ or $\overline{Id}$.

An interval $I$ of a signed permutation $\sigma$ on [n] is a segment of adjacent elements of $\sigma$. The
content of $I$ is the subset of [n] defined by the absolute values of the elements of $I$. Given $\sigma$, an
interval is defined by its content and from now on, when the context is unambiguous, we identify an
interval with its content.

The reversal of an interval of a signed permutation reverses the order of the elements of the
interval, while changing their signs. If $\sigma$ is a permutation, we denote by $\overline{\sigma}$ the
permutation obtained by reversing the complete permutation $\sigma$. A scenario for $\sigma$ is a sequence of
reversals that transforms $\sigma$ into $Id_n$ or $\overline{Id}_n$. The length of such a scenario is the
number of reversals it contains. The length of a reversal is the number of elements in the interval
that is reversed.
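As a minimal sketch of the reversal operation just defined, assume a signed permutation is stored as a Python list of nonzero integers (a negative integer standing for a barred element) and that the interval is given by 0-based positions i..j, both included. The name `reverse_interval` is illustrative and does not come from the paper.

```python
def reverse_interval(sigma, i, j):
    """Return a copy of sigma in which positions i..j (inclusive, 0-based)
    are reversed and every sign inside that segment is flipped."""
    segment = [-x for x in reversed(sigma[i:j + 1])]
    return sigma[:i] + segment + sigma[j + 1:]


sigma = [1, -3, -2, 4]
# One reversal of length 2 (content {2, 3}) sorts this permutation.
print(reverse_interval(sigma, 1, 2))
```

With this encoding, the length of the reversal is simply `j - i + 1`, and a scenario is a list of such (i, j) pairs applied one after the other.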

Two distinct intervals $I$ and $J$ commute if their contents trivially intersect, that is either
$I \subset J$, or $J \subset I$, or $I \cap J = \emptyset$. If intervals $I$ and $J$ do not commute, they
overlap. A common interval of a permutation $\sigma$ on [n] is a subset of [n] that is an interval in both
$\sigma$ and the identity permutation $Id_n$. The singletons and the set {1, 2, . . . , n} are always
common intervals, called trivial common intervals.
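The two definitions above can be illustrated by a brute-force sketch. It assumes the same list encoding of signed permutations (negative integers for barred elements) and represents an interval by its content as a `frozenset`; the function names are hypothetical, and the quadratic enumeration is only meant for small examples, not as the linear-time method cited in the text.

```python
def common_intervals(sigma):
    """Contents of all common intervals of sigma and the identity,
    found by testing every segment (signs are ignored)."""
    values = [abs(x) for x in sigma]
    found = set()
    for i in range(len(values)):
        lo = hi = values[i]
        for j in range(i, len(values)):
            lo, hi = min(lo, values[j]), max(hi, values[j])
            # A segment is an interval of the identity exactly when its
            # values are consecutive, i.e. max - min = length - 1.
            if hi - lo == j - i:
                found.add(frozenset(values[i:j + 1]))
    return found


def commute(a, b):
    """Two contents commute if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


for content in sorted(common_intervals([1, -3, -2, 4]), key=lambda c: (len(c), min(c))):
    print(sorted(content))
```

For (1 -3 -2 4) the non-trivial common intervals are {2, 3}, {1, 2, 3} and {2, 3, 4}; the last two overlap, so they do not commute.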

A scenario $S$ for $\sigma$ is called a perfect scenario if every reversal of $S$ commutes with every
common interval of $\sigma$. A perfect scenario of minimal length is called a parsimonious perfect scenario.

A permutation $\sigma$ is said to be commuting if there exists a perfect scenario for $\sigma$ such that for
every pair of reversals of this scenario, the corresponding intervals commute. In such a case, this
property holds for every perfect scenario for $\sigma$ [4].

The strong interval tree. A common interval $I$ of a permutation $\sigma$ is a strong interval of $\sigma$ if
it commutes with every other common interval of $\sigma$.
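A minimal brute-force sketch of this definition follows, under the assumption that a signed permutation is a Python list of nonzero integers and that intervals are represented by their contents. The helper names are illustrative; the approach is cubic or worse and is not the linear-time algorithm referred to in the text.

```python
def common_intervals(sigma):
    """Contents of all common intervals of sigma and the identity."""
    values = [abs(x) for x in sigma]
    found = set()
    for i in range(len(values)):
        lo = hi = values[i]
        for j in range(i, len(values)):
            lo, hi = min(lo, values[j]), max(hi, values[j])
            if hi - lo == j - i:  # consecutive values
                found.add(frozenset(values[i:j + 1]))
    return found


def strong_intervals(sigma):
    """Common intervals that commute (are nested or disjoint) with
    every other common interval of sigma."""
    all_ci = common_intervals(sigma)

    def commute(a, b):
        return a <= b or b <= a or not (a & b)

    return {a for a in all_ci if all(commute(a, b) for b in all_ci)}


for content in sorted(strong_intervals([1, -3, -2, 4]), key=lambda c: (len(c), min(c))):
    print(sorted(content))
```

On (1 -3 -2 4) the strong intervals are the four singletons, {2, 3} and the whole set: {1, 2, 3} and {2, 3, 4} are common intervals but overlap each other, so neither is strong.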

Figure 1: The strong interval tree $T_S(\sigma)$ of the permutation $\sigma$ =
(1 8 4 2 5 3 9 6 7 12 10 14 13 11 15 17 16 18). Prime and linear vertices are distinguished
by their shape. There are three non-trivial linear vertices, the rectangular vertices, and three prime
vertices, the round vertices. The root and the vertex {6, 7} are increasing linear vertices, while the
linear vertices {16, 17} and {13, 14} are decreasing.



The inclusion order of the set of strong intervals defines an n-leaf tree, denoted by $T_S(\sigma)$, whose
leaves are the singletons, and whose root is the interval containing all elements of the permutation.
The strong interval tree of $\sigma$ can be computed in linear time and space (see [7] for example). We call
the tree $T_S(\sigma)$ the strong interval tree of $\sigma$, and we identify a vertex of $T_S(\sigma)$ with the
strong interval it represents. In a more combinatorial context, this tree is also called the
substitution decomposition tree [2]. If $\sigma$ is a signed permutation, the sign of every element of
$\sigma$ is given to the corresponding leaf in $T_S(\sigma)$.

Let $I$ be a strong interval of $\sigma$ and $\mathcal{I} = (I_1, \ldots, I_k)$ the unique partition of the
elements of $I$ into maximal strong intervals, from left to right. The quotient permutation of $I$,
denoted $\sigma_I$, is defined as follows: $\sigma_I(i)$ is smaller than $\sigma_I(j)$ in $\sigma_I$ if any
element of $I_i$ is smaller (in absolute value if $\sigma$ is a signed permutation) than any element of
$I_j$. The vertex $I$, or equivalently the strong interval $I$ of $\sigma$, is either: increasing linear, if
$\sigma_I$ is the identity permutation, or decreasing linear, if $\sigma_I$ is the reversed identity
permutation, or prime, otherwise. For exposition purposes we consider that an increasing vertex is
positive and a decreasing vertex is negative. The strong interval tree as computed in the algorithm
of [7] contains the nature (increasing linear, decreasing linear or prime) of each vertex. It can be
adapted to compute also in linear time the quotient permutation associated to each strong interval.
(See Fig. 1 for an example.)

For a vertex $I$ of $T_S(\sigma)$, we denote by $L(I)$ the set of elements of $\sigma$ that label leaves of the
subtree of $T_S(\sigma)$ rooted at $I$.

The strong interval tree as a guide for perfect sorting by reversals. We now describe
important properties, related to the strong interval tree, of the algorithm described in [4] for perfect
sorting by reversals of a signed permutation. Let $\sigma$ be a signed permutation of size n and
$T_S(\sigma)$ its strong interval tree, having m internal vertices, called $I_1, \ldots, I_m$, including p
prime vertices.

Theorem 1.

1. The algorithm described in [4] can compute a parsimonious perfect scenario for $\sigma$ in worst-case
time $O(2^p\, n \sqrt{n \log(n)})$.

2. $\sigma$ is a commuting permutation if and only if p = 0.

3. If $\sigma$ is a commuting permutation, then every perfect scenario has for reversals set the set
$\{L(I_j) \mid I_j \text{ has a sign different from its parent in } T_S(\sigma)\}$.



Remark 1. The strong interval tree of an unsigned permutation is equivalent to the modular
decomposition tree of the corresponding labeled permutation graph (see [1] for example). Also, commuting
permutations have been investigated, in connection with permutation patterns, under the name of
separable permutations.



3 On the number of prime vertices 

Motivated by the average-time complexity of the algorithm described in [4] for computing a
parsimonious perfect scenario, we first investigate the average shape of a strong interval tree of a
permutation of size n. Such a tree is characterized by the shape of the tree along with the quotient
permutations labeling internal vertices. For prime vertices, those quotient permutations correspond
to simple permutations as defined in [2]. We first concentrate on enumerative results on simple
permutations. Next, we derive from them enumerative consequences on the number of permutations
whose strong interval tree has a given shape. Exhibiting a family of shapes with only one prime
vertex, we can prove that nearly all permutations have a strong interval tree of this special shape.



3.1 Combinatorial preliminaries: strong interval trees and simple permutations 

Let $T_S(\sigma)$ be the strong interval tree of a permutation $\sigma$ of length n. From a combinatorial
point of view it is simply a plane tree (the children of a vertex are totally ordered) with n leaves and
its internal vertices labeled by their quotient permutation: an internal vertex having k children
can be labeled either by the permutation $(1\ 2\ \ldots\ k)$ (increasing linear vertex), the permutation
$(k\ \ k-1\ \ldots\ 1)$ (decreasing linear vertex) or a permutation of length k whose only common intervals
are trivial (prime vertex). Because $T_S(\sigma)$ represents the common intervals between $\sigma$
and the identity permutation, it has two important properties.

Property 1. 1. No edge can be incident to two increasing or two decreasing linear vertices.

2. The labeling of the leaves by the integers {1, . . . , n} is implicitly defined by the labels of the
internal vertices.

Permutations whose common intervals are trivial are called simple permutations. The shortest
simple permutations are of length 4 and are (3 1 4 2) and (2 4 1 3). The enumeration of simple
permutations was investigated in [2]. The authors prove that this enumerative sequence is not
P-recursive and there is no known closed formula for the number of simple permutations of a given
size. However, it was shown in [2] that an asymptotic equivalent for the number $s_n$ of simple
permutations of size n is

\[
\frac{n!}{e^2}\left(1 - \frac{4}{n} + \frac{2}{n(n-1)} + O\!\left(\frac{1}{n^3}\right)\right)
\quad \text{when } n \to \infty. \tag{1}
\]
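For small sizes the numbers $s_n$ can be obtained by exhaustive search, which gives a quick check of the definition and of the two permutations of length 4 quoted above. The sketch below is a brute-force illustration with hypothetical function names, not the enumeration method of [2]. Note that the test is vacuously passed by all permutations of size 1 and 2, whereas the text starts counting at length 4.

```python
from itertools import permutations
from math import e, factorial


def is_simple(perm):
    """True if the only common intervals of perm are singletons and the whole set."""
    n = len(perm)
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i + 1, n):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i and j - i + 1 < n:
                return False  # non-trivial common interval found
    return True


def count_simple(n):
    """Number of simple permutations of size n, by exhaustive search."""
    return sum(is_simple(p) for p in permutations(range(1, n + 1)))


def estimate(n):
    """Main terms of Equation (1), without the O(1/n^3) remainder."""
    return factorial(n) / e ** 2 * (1 - 4 / n + 2 / (n * (n - 1)))


print([p for p in permutations(range(1, 5)) if is_simple(p)])
for n in range(4, 8):
    print(n, count_simple(n), round(estimate(n), 1))
```

The exact counts for n = 4, 5, 6, 7 are 2, 6, 46 and 338; for n = 7 the truncated estimate is roughly 325, so the approximation is already reasonable at very small sizes.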
3.2 Average shape of strong interval trees

A twin in a strong interval tree is a vertex of degree 2 such that each of its two children is a leaf. 
A twin is then a linear vertex. The following result, that applies both to signed permutations and 
unsigned permutations, is the main result of this section. 

Theorem 2. Asymptotically, with probability 1, a random permutation $\sigma$ of size n has a strong
interval tree such that the root is a prime vertex and every child of the root is either a leaf or a twin.
Moreover the probability that $T_S(\sigma)$ has such a shape with exactly k twins is $\frac{2^k}{e^2\, k!}$.

The proof follows from Lemma 1 and Equation (1).

Lemma 1. If $p'_{n,k}$ denotes the number of permutations of length n which contain a common interval
$I$ of length k then for any fixed positive integer c:

\[
\sum_{k=c+2}^{n-c} \frac{p'_{n,k}}{n!} = O(n^{-c}).
\]

Proof. This lemma generalizes to any common interval the following result.

Lemma 2. [2, Lemma 7] A common interval in a permutation is said to be minimal if it is not a singleton
and each common interval included in it is trivial. If $p_{n,k}$ denotes the number of permutations of
length n which contain a minimal common interval of length k then for any fixed positive integer c:

\[
\sum_{k=c+2}^{n-c} \frac{p_{n,k}}{n!} = O(n^{-c}).
\]

The proof of Lemma 1 is very similar to the one in [2]. We have $p'_{n,k} \le (n-k+1)\, k!\, (n-k+1)!$.
Indeed, the right hand side counts the number of quotient permutations corresponding to $I$ ($k!$), the
possible values of the minimal element of $I$ ($n-k+1$) and the structure of the rest of the permutation
with one more element which marks the insertion of $I$ ($(n-k+1)!$). Only the extremal terms of the
sum can have magnitude $O(n^{-c})$ and the remaining terms have magnitude $O(n^{-c-1})$. Since there
are fewer than n terms the result of Lemma 1 follows.

□

Proof of Theorem 2. Lemma 1 with c = 1 gives that the proportion of non-simple permutations with
common intervals of size greater than or equal to 3 is $O(n^{-1})$. But permutations whose common
intervals are only of size 1, 2 or n are exactly permutations whose strong interval tree has a prime
root and every child is either a leaf or a twin.

Then the number of permutations whose strong interval tree has a prime root with k twins is
$s_{n-k}\binom{n-k}{k}2^k$. From Equation (1) the asymptotics for this number is
$\frac{n!}{e^2}\,\frac{2^k}{k!}$, proving Theorem 2.

□
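The counting step of this proof can be checked by exhaustive search on a small size. The sketch below, with hypothetical function names, counts the permutations of size n whose non-trivial common intervals are exactly k intervals of size 2 and compares the result with $s_{n-k}\binom{n-k}{k}2^k$, where $s_m$ is itself obtained by brute force. It is only an illustration for one small n (here n = 6) and says nothing about the asymptotic regime of Theorem 2.

```python
from itertools import permutations
from math import comb


def interval_sizes(perm):
    """Sizes of the common intervals of perm that have at least two elements."""
    n, sizes = len(perm), []
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i + 1, n):
            lo, hi = min(lo, perm[j]), max(hi, perm[j])
            if hi - lo == j - i:
                sizes.append(j - i + 1)
    return sizes


def count_simple(m):
    """Permutations of size m >= 2 whose only common interval of size >= 2 is the whole set."""
    return sum(interval_sizes(p) == [m] for p in permutations(range(1, m + 1)))


def twin_profile(n):
    """Map k -> number of permutations of size n whose non-trivial common
    intervals are exactly k intervals of size 2 (the twins)."""
    profile = {}
    for p in permutations(range(1, n + 1)):
        proper = [s for s in interval_sizes(p) if s < n]
        if all(s == 2 for s in proper):
            profile[len(proper)] = profile.get(len(proper), 0) + 1
    return profile


def predicted(n, k):
    """s_{n-k} * binomial(n-k, k) * 2^k, the count used in the proof."""
    return count_simple(n - k) * comb(n - k, k) * 2 ** k


n = 6
print(twin_profile(n))
print({k: predicted(n, k) for k in range(4)})
```

For n = 6 both computations give 46 permutations with no twin, 60 with one twin and 48 with two twins, and none with three.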

3.3 Average time complexity of perfect sorting by reversals 

Corollary 1. The algorithm described in [4] for computing a parsimonious perfect scenario for a
random permutation runs in polynomial time with probability 1 as $n \to \infty$.
Figure 2: Values of $p_n$ up to n = 25.



Proof. Direct consequence of point 1 in Theorem 1 and of Theorem 2 applied on signed permutations.

□

This result however does not imply that the average complexity of this algorithm is polynomial,
as the average time complexity is the sum of the complexity on all instances of size n divided by
the number of instances. Formally, to assess the average time complexity, we need to prove that as
n grows, the ratio

\[
p_n = \frac{\sum_{p} 2^p\, T_{n,p}}{T_n}
\]

is bounded by a polynomial in n, where $T_n$ is the number of strong interval trees with n leaves and
$T_{n,p}$ the number of such trees with p prime vertices.

Let $T(x, y)$ be the bivariate generating function $T(x, y) = \sum_{n,p} T_{n,p}\, x^n y^p$. Then
$p_n = [x^n]T(x, 2) / T_n$. Let moreover $P(x)$ be the generating function of simple permutations
$P(x) = \sum_{n \ge 0} s_n x^n$ (whose first terms can be obtained from entry A111111 in [18]). Using the
specification for strong interval trees given in Section 3.1 and techniques described in [12] for
example, it is immediate that $T(x, y)$ satisfies the following system of functional equations:

\[
\begin{cases}
T(x,y) = x + y\,P(T(x,y)) + 2\,\dfrac{B(x,y)^2}{1 - B(x,y)} \\[1.5ex]
B(x,y) = x + y\,P(T(x,y)) + \dfrac{B(x,y)^2}{1 - B(x,y)}
\end{cases}
\]

By iterating these equations, we computed the first 25 values of $p_n$ (Fig. 2), which suggest that
$p_n$ is even bounded by a constant close to 2 and lead us to Conjecture 1.
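The iteration can be sketched on truncated power series as follows. This is an illustrative reconstruction, not the authors' program: the function names are hypothetical, the numbers of simple permutations are recomputed by brute force (which limits the sketch to very small n), and it assumes that prime vertices have at least four children, so that only $s_k$ with $k \ge 4$ enter $P$. The sketch works with unsigned permutations, for which the coefficient of $x^n$ in $T(x, 1)$ should be $n!$.

```python
from itertools import permutations


def count_simple(n):
    """Brute-force number of simple permutations of size n (used for n >= 4)."""
    total = 0
    for p in permutations(range(n)):
        simple = True
        for i in range(n):
            lo = hi = p[i]
            for j in range(i + 1, n):
                lo, hi = min(lo, p[j]), max(hi, p[j])
                if hi - lo == j - i and j - i + 1 < n:
                    simple = False
        total += simple
    return total


def mul(a, b, N):
    """Product of two power series (coefficient lists), truncated at x^N."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c


def tree_series(N, y, simple):
    """Coefficients [x^n] T(x, y) for n <= N, by iterating
    T = x + y P(T) + 2 B^2/(1-B) and B = x + y P(T) + B^2/(1-B)."""
    x = [0, 1] + [0] * (N - 1)
    T, B = x[:], x[:]
    for _ in range(N):  # each pass fixes at least one more coefficient
        prime, linear = [0] * (N + 1), [0] * (N + 1)
        Tk, Bk = T[:], B[:]
        for k in range(2, N + 1):
            Tk, Bk = mul(Tk, T, N), mul(Bk, B, N)  # T^k and B^k
            for n in range(N + 1):
                linear[n] += Bk[n]  # B^2/(1-B) = sum over k >= 2 of B^k
                prime[n] += y * simple.get(k, 0) * Tk[n]
        T = [x[n] + prime[n] + 2 * linear[n] for n in range(N + 1)]
        B = [x[n] + prime[n] + linear[n] for n in range(N + 1)]
    return T


N = 7
simple = {k: count_simple(k) for k in range(4, N + 1)}
T1, T2 = tree_series(N, 1, simple), tree_series(N, 2, simple)
for n in range(1, N + 1):
    print(n, T1[n], T2[n] / T1[n])  # n, T_n, p_n
```

For instance, among the 24 permutations of size 4, 22 have no prime vertex and 2 have one, so $p_4 = (22 + 2 \cdot 2)/24 = 26/24$.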

Conjecture 1. The average-time complexity of the algorithm described in [4] for computing a
parsimonious perfect scenario is polynomial, bounded by $O(n\sqrt{n})$.



4 Average-case properties of commuting permutations 

We now study the family of commuting (signed) permutations and more precisely the average 
number of reversals in a parsimonious perfect scenario for a commuting permutation and the average 
length of a reversal of such a scenario. 

Let $\sigma$ be a commuting permutation of size n, i.e. a signed permutation whose strong interval
tree $T_S(\sigma)$ has no prime vertex. It follows from the combinatorial specification of strong interval
trees given in Section 3.1 that $T_S(\sigma)$ is simply a plane tree with internal vertices having at least
two children and a sign on the root (that defines implicitly the signs of the other internal vertices
from point 1 in Property 1 and the labels {1 . . . n} of the leaves). These trees are then Schröder trees
(entry A001003 in the On-Line Encyclopedia of Integer Sequences [18]) with a sign on the root.

Theorem 3. The average length of a parsimonious perfect scenario for a commuting permutation
of length n is asymptotically

\[
\frac{1 + \sqrt{2}}{2}\, n \simeq 1.2\,n.
\]

Proof. From the previous section and points 2 and 3 in Theorem 1, the problem of computing the
expected number of reversals of a parsimonious perfect scenario reduces to computing the expected
number of internal vertices of $T_S(\sigma)$ other than the root (because two adjacent linear vertices
cannot have the same sign) and the expected number of leaves whose sign in $\sigma$ differs from the sign
of its parent in $T_S(\sigma)$.

The expected number of leaves whose sign in $\sigma$ is different from its parent in $T_S(\sigma)$ is
obviously n/2, as the sign of the leaf and of its parent are independent.

To compute the average number of internal vertices in a Schröder tree, we use symbolic methods
as defined in [12]. Let us define the bivariate generating function
$S(x,y) = \sum_{k,n} S_{n,k}\, x^n y^k$ where $S_{n,k}$ denotes the number of Schröder trees with n leaves
and k internal vertices. The average number of internal vertices in a Schröder tree with n leaves is

\[
\frac{\sum_k k\, S_{n,k}}{\sum_k S_{n,k}}
= \frac{[x^n]\,\frac{\partial S(x,y)}{\partial y}\big|_{y=1}}{[x^n]\,S(x,1)}.
\]

A Schröder tree can be recursively described as a single leaf, or a root having at least two
children, which are again Schröder trees. Consequently, $S(x, y)$ satisfies the equation

\[
S(x,y) = x + y\,\frac{S(x,y)^2}{1 - S(x,y)},
\]

and solving this equation gives

\[
S(x,y) = \frac{(x+1) - \sqrt{(x+1)^2 - 4x(y+1)}}{2(y+1)}. \tag{2}
\]
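Equation (2) can be sanity-checked numerically. The sketch below, with illustrative function names, first computes the coefficients of $S(x,1)$ from the functional equation: clearing the denominator in $S = x + S^2/(1-S)$ gives $S = x + 2S^2 - xS$, so each coefficient is determined by the earlier ones. It then checks that the closed form satisfies the functional equation at a sample point inside the disc of convergence.

```python
from math import sqrt


def schroder_counts(N):
    """[x^n] S(x, 1) for n = 0..N, using S = x + 2 S^2 - x S."""
    s = [0] * (N + 1)
    if N >= 1:
        s[1] = 1
    for n in range(2, N + 1):
        conv = sum(s[i] * s[n - i] for i in range(1, n))  # [x^n] S^2
        s[n] = 2 * conv - s[n - 1]
    return s


def closed_form(x, y):
    """Right-hand side of Equation (2)."""
    return ((x + 1) - sqrt((x + 1) ** 2 - 4 * x * (y + 1))) / (2 * (y + 1))


print(schroder_counts(6)[1:])

x, y = 0.1, 1.0
S = closed_form(x, y)
print(S, x + y * S * S / (1 - S))  # the two values should agree
```

The first coefficients are 1, 1, 3, 11, 45, 197, the beginning of entry A001003.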

We compute an asymptotic equivalent of the number $[x^n]S(x, 1)$, the number of Schröder trees
([18, entry A001003]).



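Asymptotic study of $S(x,1)$. By Equation (2) we obtain

\[
S(x,1) = \frac{(x+1) - \sqrt{(x+1)^2 - 8x}}{4}
= \frac{(x+1) - \sqrt{(x - 3 + 2\sqrt{2})(x - 3 - 2\sqrt{2})}}{4},
\]

which yields the equivalent when $x \to 3 - 2\sqrt{2}$, $x < 3 - 2\sqrt{2}$:

\[
S(x,1) \sim \frac{2 - \sqrt{2}}{2} - \frac{\sqrt{3\sqrt{2} - 4}}{2}\left(1 - \frac{x}{3 - 2\sqrt{2}}\right)^{1/2}.
\]

Applying the techniques of [12, chapters 4 and 6] gives the following equivalent of the coefficients
$[x^n]S(x, 1)$ when $n \to \infty$:

\[
[x^n]S(x,1) \sim \frac{\sqrt{3\sqrt{2} - 4}}{4\sqrt{\pi n^3}}\,(3 + 2\sqrt{2})^n.
\]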
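This coefficient estimate can be compared with exact values. The sketch below recomputes $[x^n]S(x,1)$ exactly from the functional equation (rewritten as $S = x + 2S^2 - xS$) and prints the ratio to the asymptotic formula; the function names are illustrative, and the choice of sample sizes is arbitrary.

```python
from math import pi, sqrt


def schroder_counts(N):
    """Exact [x^n] S(x, 1) for n = 0..N, using S = x + 2 S^2 - x S."""
    s = [0] * (N + 1)
    if N >= 1:
        s[1] = 1
    for n in range(2, N + 1):
        conv = sum(s[i] * s[n - i] for i in range(1, n))
        s[n] = 2 * conv - s[n - 1]
    return s


def asymptotic(n):
    """sqrt(3 sqrt(2) - 4) / (4 sqrt(pi n^3)) * (3 + 2 sqrt(2))^n."""
    return sqrt(3 * sqrt(2) - 4) / (4 * sqrt(pi * n ** 3)) * (3 + 2 * sqrt(2)) ** n


s = schroder_counts(200)
for n in (10, 50, 200):
    print(n, s[n] / asymptotic(n))  # ratio of exact to estimated value
```

The ratio approaches 1 as n grows, with a relative error that appears to decrease like a constant over n.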
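Asymptotic study of $\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$. By Equation (2) we obtain

\[
\frac{\partial S(x,y)}{\partial y}\Big|_{y=1}
= \frac{(x-1)^2 - (x+1)\sqrt{(x+1)^2 - 8x}}{8\sqrt{(x+1)^2 - 8x}}.
\]

From the above expression, we can obtain an equivalent of
$\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$ when $x \to 3 - 2\sqrt{2}$, $x < 3 - 2\sqrt{2}$. Namely,

\[
\frac{\partial S(x,y)}{\partial y}\Big|_{y=1}
\sim \frac{3 - 2\sqrt{2}}{4\sqrt{3\sqrt{2} - 4}}\left(1 - \frac{x}{3 - 2\sqrt{2}}\right)^{-1/2}.
\]

As before, we deduce that an equivalent of the coefficients
$[x^n]\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$ when $n \to \infty$ is

\[
[x^n]\frac{\partial S(x,y)}{\partial y}\Big|_{y=1}
\sim \frac{3 - 2\sqrt{2}}{4\sqrt{3\sqrt{2} - 4}}\,\frac{(3 + 2\sqrt{2})^n}{\sqrt{\pi n}}.
\]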

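An equivalent of the average number of internal vertices in a Schröder tree with n leaves is now
easily derived as

\[
\frac{[x^n]\,\frac{\partial S(x,y)}{\partial y}\big|_{y=1}}{[x^n]\,S(x,1)}
\sim \frac{3 - 2\sqrt{2}}{3\sqrt{2} - 4}\, n = \frac{n}{\sqrt{2}}.
\]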
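The equivalent $n/\sqrt{2}$ can be compared with exact averages. The sketch below is an illustration with hypothetical names: it uses the functional equation $S = x + yS^2/(1-S)$ rewritten as $S = x + (1+y)S^2 - xS$; differentiating with respect to y and setting y = 1 gives $U = S^2 + 4SU - xU$ for $U = \frac{\partial S}{\partial y}\big|_{y=1}$, which yields a recurrence on the coefficients.

```python
from math import sqrt


def internal_vertex_totals(N):
    """Return (s, u) where s[n] is the number of Schröder trees with n leaves
    and u[n] the total number of internal vertices over all these trees."""
    s, u = [0] * (N + 1), [0] * (N + 1)
    s[1] = 1
    for n in range(2, N + 1):
        ss = sum(s[i] * s[n - i] for i in range(1, n))  # [x^n] S^2
        su = sum(s[i] * u[n - i] for i in range(1, n))  # [x^n] S U
        s[n] = 2 * ss - s[n - 1]
        u[n] = ss + 4 * su - u[n - 1]
    return s, u


s, u = internal_vertex_totals(300)
for n in (3, 4, 30, 300):
    print(n, u[n] / s[n], n / sqrt(2))  # exact average vs. equivalent
```

For n = 3 the three trees have 1, 2 and 2 internal vertices, giving an average of 5/3; for larger n the exact average stays close to $n/\sqrt{2}$, slightly below it.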

Combining all results together. The number above is the average number of internal vertices
in Schröder trees with n leaves, including the root if it is not a leaf (i.e. n > 1). A given Schröder
tree with n leaves can have its internal vertices and leaves signed in $2^{n+1}$ ways (2 choices for the
sign of the root, that define the signs of all other internal vertices, and $2^n$ choices for the signs of
the n leaves). As these signs do not change the number of internal vertices of the tree, the average
number of internal vertices in such signed Schröder trees does not change. We also have to discard
the root as it does not define a reversal, but this does not change the asymptotic behaviour and,
adding n/2 to account for signed leaves that define reversals, we obtain

\[
\frac{n}{\sqrt{2}} + \frac{n}{2} = \frac{1 + \sqrt{2}}{2}\, n.
\]

□

Remark 2. It is interesting to note the large representation of reversals of length 1, which make up
almost half of the expected reversals. A similar property was observed in [15] on datasets of bacterial
genomes.

Theorem 4. The average length of a reversal in a parsimonious perfect scenario for a commuting
permutation of length n is asymptotically

\[
\frac{2^{7/4}\sqrt{3 - 2\sqrt{2}}}{1 + \sqrt{2}}\,\sqrt{\pi n} \simeq 1.02\sqrt{n}.
\]
Proof. We want to compute the ratio between the average sum of the lengths of the reversals of
a parsimonious perfect scenario for a commuting permutation and the average length of such a
scenario. The latter was obtained above (Theorem 3), and we concentrate on the former.

A reversal defined by a vertex x of the strong interval tree $T_S(\sigma)$ is of length $|L(x)|$ (it reverses
the segment of the signed permutation that contains the leaves of the subtree rooted at x, see [4]).
We first focus on the average value of the sum of the sizes of all subtrees in a Schröder tree. For
simplicity in the computation, we will also count the whole tree and the leaves as subtrees (the
latter obviously of size 1), which gives the quantity we want to compute up to subtracting $3n/2$
from the final result. We first define the bivariate generating function (that we call again S, but
which is slightly different) following the standard analytic method defined in [12]

\[
S(x,y) = \sum_{k,n} S_{n,k}\, x^n y^k
\]

where $S_{n,k}$ denotes the number of Schröder trees with n leaves and sizes of subtrees (including
leaves and the whole tree) that sum to k. The average value of the sum of the sizes of every subtree
in a Schröder tree with n leaves is

\[
\frac{\sum_k k\, S_{n,k}}{\sum_k S_{n,k}}
= \frac{[x^n]\,\frac{\partial S(x,y)}{\partial y}\big|_{y=1}}{[x^n]\,S(x,1)}.
\]

A Schröder tree can be recursively described as a single leaf or a root having at least two children,
which are again Schröder trees. In the second case, the subtrees are those involved in the children
of the root, plus the tree itself (which is a subtree of size n), which gives the functional equation (3):

\[
S(x,y) = xy + \frac{S(xy,y)^2}{1 - S(xy,y)}. \tag{3}
\]
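The statistic marked by y in Equation (3) can be made concrete by enumerating small Schröder trees explicitly. The sketch below is a brute-force illustration with hypothetical names: a tree is a nested tuple, a leaf is the empty tuple, and for each n it reports the number of trees and the total, over all trees, of the sum of subtree sizes (leaves and the whole tree included).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def schroder_trees(n):
    """All Schröder trees with n leaves as nested tuples; a leaf is ()."""
    if n == 1:
        return ((),)
    return tuple(tuple(f) for f in forests(n, 2))


def forests(n, min_parts):
    """Yield lists of trees with n leaves in total and at least min_parts trees."""
    if n == 0:
        if min_parts <= 0:
            yield []
        return
    # The first tree must leave one leaf for each further required tree.
    for first in range(1, n - max(min_parts - 1, 0) + 1):
        for head in schroder_trees(first):
            for tail in forests(n - first, min_parts - 1):
                yield [head] + tail


def leaves(t):
    """Number of leaves of the tree t."""
    return 1 if t == () else sum(leaves(c) for c in t)


def subtree_size_sum(t):
    """Sum of the leaf counts of all subtrees of t (leaves and t included)."""
    return leaves(t) + sum(subtree_size_sum(c) for c in t)


for n in range(1, 7):
    trees = schroder_trees(n)
    print(n, len(trees), sum(subtree_size_sum(t) for t in trees))
```

For n = 3 there are three trees: the root with three leaves contributes 3 + 3 = 6, and each of the two binary trees contributes 3 + 2 + 3 = 8, for a total of 22, which is the coefficient of $x^3$ in $\frac{\partial S}{\partial y}\big|_{y=1}$.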

Since this equation involves both $S(x,y)$ and $S(xy,y)$, we cannot extract from it an expression
for $S(x,y)$ as in the proof of Theorem 3. But since the average value of the sum of the sizes of
every subtree in a Schröder tree with n leaves can be obtained as
$[x^n]\frac{\partial S(x,y)}{\partial y}\big|_{y=1} \,/\, [x^n]S(x,1)$, there is no need to compute
$S(x,y)$ but only $S(x, 1)$ and $\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$.

Asymptotic study of $S(x,1)$. By Equation (3) we obtain
$S(x,1) = \frac{(x+1) - \sqrt{(x+1)^2 - 8x}}{4}$, which is the same function as in the proof of Theorem 3.
Hence,

\[
[x^n]S(x,1) \sim \frac{\sqrt{3\sqrt{2} - 4}}{4\sqrt{\pi n^3}}\,(3 + 2\sqrt{2})^n.
\]



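Asymptotic study of $\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$. Differentiating Equation (3) with
respect to y and setting y = 1 gives:

\[
\frac{\partial S}{\partial y}(x,1) = x + \left(x\,\frac{\partial S}{\partial x}(x,1)
+ \frac{\partial S}{\partial y}(x,1)\right)\frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}.
\]

From this, we can extract the following equation where $S(x, 1)$ has been computed before:

\[
\frac{\partial S}{\partial y}(x,1)
= \frac{x + x\,\frac{\partial S}{\partial x}(x,1)\,\frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}}
{1 - \frac{2S(x,1) - S(x,1)^2}{(1 - S(x,1))^2}}.
\]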
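The singularity closest to the origin is $3 - 2\sqrt{2}$, and the Taylor development of the above around
this singularity gives:

\[
\frac{\partial S(x,y)}{\partial y}\Big|_{y=1}
\sim \frac{3 - 2\sqrt{2}}{2}\left(1 - \frac{x}{3 - 2\sqrt{2}}\right)^{-1}.
\]

Applying the techniques of [12], this yields the following equivalent of the coefficients
$[x^n]\frac{\partial S(x,y)}{\partial y}\big|_{y=1}$ when $n \to \infty$:

\[
[x^n]\frac{\partial S(x,y)}{\partial y}\Big|_{y=1} \sim \frac{3 - 2\sqrt{2}}{2}\,(3 + 2\sqrt{2})^n.
\]

Then

\[
\frac{[x^n]\,\frac{\partial S(x,y)}{\partial y}\big|_{y=1}}{[x^n]\,S(x,1)}
\sim \frac{2(3 - 2\sqrt{2})}{\sqrt{3\sqrt{2} - 4}}\,\sqrt{\pi n^3}
\]

gives the average sum of the sizes of all subtrees of a Schröder tree.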

This is independent of the signs added to give the strong interval tree of a commuting permutation,
so this number is also the expected sum of the sizes of all subtrees of the strong interval
tree associated to a random commuting permutation. To get the expected sum of the lengths of
the reversals of a parsimonious perfect scenario for a random commuting permutation, we need to
remove the size of the whole tree, which was counted as a subtree (n), and the total size of the n
subtrees defined by the leaves (n), and to add the contribution of the reversals of size 1 (n/2 on
average), which does not change the above asymptotics.

Dividing by the average number of reversals of such a scenario (Theorem 3), we obtain Theorem 4.

□



5 Conclusion 

We showed that perfect sorting by reversals, although an intractable problem, is very likely to be 
solved in polynomial time for random signed permutations. This result relies on a study of the shape 
of a random strong interval tree that shows that asymptotically such trees are mostly composed of 
a large prime vertex at the root and small subtrees. As the strong interval tree of a permutation is 
equivalent to the modular decomposition tree of the corresponding labeled permutation graph [1], 
this result agrees with the general belief that the modular decomposition tree of a random graph 
has a large prime root. We were also able to give precise asymptotic results for the expected lengths 
of a parsimonious perfect scenario and of a reversal of such a scenario for random commuting 
permutations. 

Our research leaves at least one open problem: proving that computing a parsimonious perfect
scenario can be done in polynomial time on the average. It would also be interesting to see if our
approach can be extended to the perfect rearrangement problem for the Double-Cut-and-Join (DCJ) model
that has been introduced recently [6] and has the intriguing property that instances that were hard
to solve for reversals can be solved in polynomial time in the DCJ context and conversely.